NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Optimized FPGA-based Deep Learning Accelerator for Sparse CNN using High Bandwidth Memory

https://doi.org/10.1109/FCCM51124.2021.00026

Jiang, Chao; Ojika, David; Patel, Bhavesh; Lam, Herman (May 2021, IEEE Annual International Symposium on Field-Programmable Custom Computing Machines)
null (Ed.)
Large Convolutional Neural Networks (CNNs) are often pruned and compressed to reduce the amount of parameters and memory requirement. However, the resulting irregularity in the sparse data makes it difficult for FPGA accelerators that contains systolic arrays of Multiply-and-Accumulate (MAC) units, such as Intel’s FPGA-based Deep Learning Accelerator (DLA), to achieve their maximum potential. Moreover, FPGAs with low-bandwidth off-chip memory could not satisfy the memory bandwidth requirement for sparse matrix computation. In this paper, we present 1) a sparse matrix packing technique that condenses sparse inputs and filters before feeding them into the systolic array of MAC units in the Intel DLA, and 2) a customization of the Intel DLA which allows the FPGA to efficiently utilize a high bandwidth memory (HBM2) integrated in the same package. For end-to-end inference with randomly pruned ResNet-50/MobileNet CNN models, our experiments demonstrate 2.7x/3x performance improvement compared to an FPGA with DDR4, 2.2x/2.1x speedup against a server-class Intel SkyLake CPU, and comparable performance with 1.7x/2x power efficiency gain as compared to an NVidia V100 GPU.
more » « less
Full Text Available
Accelerating Scientific Discovery with SCAIGATE Science Gateway

https://doi.org/10.1109/eScience.2019.00085

Jiang, Chao; Ojika, David; Patel, Bhavesh; Gordon-Ross, Ann; Lam, Herman (September 2019, 15th International Conference on eScience (eScience))

SCAIGATE is an ambitious project to design the first AI-centric science gateway based on field-programmable gate arrays (FPGAs). The goal is to democratize access to FPGAs and AI in scientific computing and related applications. When completed, the project will enable the large-scale deployment and use of machine learning models on AI-centric FPGA platforms, allowing increased performance-efficiency, reduced development effort, and customization at unprecedented scale, all while simplifying ease-of-use in science domains which were previously AI-lagging.
more » « less
Full Text Available
Research Opportunities in Heterogeneous Computing for Machine Learning

https://doi.org/10.1109/HPCS.2018.00094

Lam, Herman; Ojika, David (July 2018, International Workshop on High Performance and Dynamic Reconfigurable Systems and Networks (DRSN))

In recent times, AI and deep learning have witnessed explosive growth in almost every subject involving data. Complex data analyses problems that took prolonged periods, or required laborious manual effort, are now being tackled through AI and deep-learning techniques with unprecedented accuracy. Machine learning (ML) using Convolutional Neural Networks (CNNs) has shown great promise for such applications. However, traditional CPU-based sequential computing no longer can meet the requirements of mission-critical applications which are compute-intensive and require low latency. Heterogeneous computing (HGC), with CPUs integrated with accelerators such as GPUs and FPGAs, offers unique capabilities to accelerate CNNs. In this presentation, we will focus on using FPGA-based reconfigurable computing to accelerate various aspects of CNN. We will begin with the current state of the art in using FPGAs for CNN acceleration, followed by the related R&D activities (outlined below) in the SHREC* Center at the University of Florida, based on which we will discuss the opportunities in heterogeneous computing for machine learning.
more » « less
Full Text Available
FaaM: FPGA-as-a-Microservice - A Case Study for Data Compression

https://doi.org/10.1051/epjconf/201921407029

Ojika, David; Gordon-Ross, Ann; Lam, Herman; Patel, Bhavesh; Forti, A.; Betev, L.; Litmaath, M.; Smirnova, O.; Hristov, P. (January 2019, EPJ Web of Conferences)

Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing and given the recent advances in big data and emerging trends in cloud computing (e.g., serverless [18]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [6] and Amazon Web Services [10]). To address these domains’ processing needs, recent research has focused on using FPGAs to accelerate workloads, ranging from analytics and machine learning to databases and network function virtualization. In this paper, we present an ongoing effort to realize a high-performance FPGA-as-a-microservice (FaaM) architecture for the cloud. We discuss some of the technical challenges and propose several solutions for efficiently integrating FPGAs into virtualized environments. Our case study deploying a multithreaded, multi-user compression as a microservice using the FaaM architecture indicate that microservices-based FPGA acceleration can sustain high-performance compared to straightforward implementation with minimal to no communication overhead despite the hardware abstraction.
more » « less
Full Text Available

Search for: All records